Robust Aggregation


Reliable Graph Neural Networks via Robust Aggregation

Neural Information Processing Systems

Perturbations targeting the graph structure have proven to be extremely effective in reducing the performance of Graph Neural Networks (GNNs), and traditional defenses such as adversarial training do not seem to be able to improve robustness. This work is motivated by the observation that adversarially injected edges can effectively be viewed as additional samples to a node's neighborhood aggregation function, resulting in distorted aggregations that accumulate over the layers. Conventional GNN aggregation functions, such as a sum or mean, can be distorted arbitrarily by a single outlier. We propose a robust aggregation function motivated by the field of robust statistics. Our approach exhibits the largest possible breakdown point of 0.5, which means that the bias of the aggregation is bounded as long as the fraction of adversarial edges of a node is less than 50%. Our novel aggregation function, Soft Medoid, is a fully differentiable generalization of the Medoid and therefore lends itself well to end-to-end deep learning. Equipping a GNN with our aggregation improves the robustness with respect to structure perturbations on Cora ML by a factor of 3 (and 5.5 on Citeseer), and by a factor of 8 for low-degree nodes.
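
The breakdown-point argument can be made concrete with a small NumPy sketch. The function below is a simplified, distance-weighted aggregation in the spirit of the Soft Medoid, not the paper's exact formulation (which adds further normalization); the data values are illustrative. A single injected neighbor drags the mean arbitrarily far, while the down-weighted aggregate stays near the clean embeddings.

```python
import numpy as np

def soft_medoid_sketch(X, T=1.0):
    """Distance-weighted aggregation in the spirit of the Soft Medoid
    (simplified sketch, not the paper's exact formulation): points with a
    large total distance to all others get exponentially small weight."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise L2
    scores = dists.sum(axis=1)                  # total distance to the rest
    w = np.exp(-(scores - scores.min()) / T)    # stable softmax of -scores/T
    return (w / w.sum()) @ X                    # differentiable weighted mean

clean = np.array([[1.0, 2.0], [1.2, 1.8], [0.9, 2.1]])  # benign neighbor embeddings
poisoned = np.vstack([clean, [[1e3, -1e3]]])            # one adversarially injected edge

print(poisoned.mean(axis=0))         # mean: dragged far away by the single outlier
print(soft_medoid_sketch(poisoned))  # stays close to the clean neighbors
```

With one outlier among four samples (25% < 50%), the weighted aggregate remains bounded, illustrating the breakdown-point guarantee; the mean, by contrast, can be moved arbitrarily far simply by scaling the outlier.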


Private Aggregation for Byzantine-Resilient Heterogeneous Federated Learning

Egger, Maximilian, Bitar, Rawad

arXiv.org Machine Learning

Ensuring resilience to Byzantine clients while maintaining the privacy of the clients' data is a fundamental challenge in federated learning (FL). When the clients' data is homogeneous, suitable countermeasures were studied from an information-theoretic perspective utilizing secure aggregation techniques while ensuring robust aggregation of the clients' gradients. However, the countermeasures used fail when the clients' data is heterogeneous. Suitable pre-processing techniques, such as nearest neighbor mixing, were recently shown to enhance the performance of those countermeasures in the heterogeneous setting. Nevertheless, those pre-processing techniques cannot be applied with the introduced privacy-preserving mechanisms. We propose a multi-stage method encompassing a careful co-design of verifiable secret sharing, secure aggregation, and a tailored symmetric private information retrieval scheme to achieve information-theoretic privacy guarantees and Byzantine resilience under data heterogeneity. We evaluate the effectiveness of our scheme on a variety of attacks and show how it outperforms the previously known techniques. Since the communication overhead of secure aggregation is non-negligible, we investigate the interplay with zero-order estimation methods that reduce the communication cost in state-of-the-art FL tasks and thereby make private aggregation scalable.
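
The secure-aggregation building block the abstract relies on can be illustrated with a minimal pairwise-masking sketch. This is a toy illustration of the general idea, not the paper's scheme: real protocols derive the masks from key agreement and handle client dropouts, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
updates = [rng.normal(size=4) for _ in range(3)]  # three clients' model updates

# Each pair (i, j), i < j, agrees on a random mask r_ij; client i adds it,
# client j subtracts it, so all masks cancel in the server-side sum.
n = len(updates)
masks = {(i, j): rng.normal(size=4) for i in range(n) for j in range(i + 1, n)}

masked = []
for i, u in enumerate(updates):
    m = u.copy()
    for (a, b), r in masks.items():
        if a == i:
            m += r
        elif b == i:
            m -= r
    masked.append(m)

server_sum = sum(masked)  # equals the true sum, yet no single update is revealed
print(np.allclose(server_sum, sum(updates)))  # True
```

Each masked update looks random on its own, which is exactly why nonlinear pre-processing such as nearest neighbor mixing cannot simply be applied on top: the server never sees the individual vectors it would need to compare.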


Reviews: A Little Is Enough: Circumventing Defenses For Distributed Learning

Neural Information Processing Systems

In general, I like the question this paper asks, i.e., whether it is necessary to impose a large deviation from the model parameters in order to attack distributed learning. Most of the research in Byzantine-tolerant distributed learning, including Krum, Bulyan, and Trimmed Mean, uses some statistically "robust aggregation" instead of a simple mean at the PS to mitigate the effect of adversaries. By the nature of robust statistics, all of those methods take a positive answer to the above question for granted, which serves as a cornerstone of their correctness. Thus, the fact that this paper gives a negative answer is inspiring and may force researchers to rethink whether robust aggregation is enough for Byzantine-tolerant machine learning. However, the authors seem unaware of DRACO (listed below), which is very different from the baselines considered in this paper.
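
As a concrete example of the "robust aggregation" rules the review mentions, here is a sketch of the coordinate-wise trimmed mean (illustrative data; Krum and Bulyan follow the same filter-the-extremes intuition). The paper's point is precisely that an attack need not be extreme: a small, coordinated shift within the honest workers' variance survives the trimming.

```python
import numpy as np

def trimmed_mean(grads, b):
    """Coordinate-wise trimmed mean: sort each coordinate across workers,
    drop the b largest and b smallest values, and average the rest."""
    g = np.sort(np.stack(grads), axis=0)
    return g[b:len(grads) - b].mean(axis=0)

honest = [np.array([1.0, -0.5]) + 0.01 * i for i in range(8)]   # similar gradients
byzantine = [np.array([50.0, 50.0]), np.array([-50.0, -50.0])]  # blatant outliers
print(trimmed_mean(honest + byzantine, b=2))  # close to the honest mean
```

Blatant outliers like these are filtered out, but Byzantine workers that all report, say, the honest mean plus a small consistent perturbation pass straight through the trimming and still bias the aggregate over many rounds.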


Byzantine-Resilient Zero-Order Optimization for Communication-Efficient Heterogeneous Federated Learning

Egger, Maximilian, Bakshi, Mayank, Bitar, Rawad

arXiv.org Machine Learning

We introduce CyBeR-0, a Byzantine-resilient federated zero-order optimization method that is robust under Byzantine attacks and provides significant savings in uplink and downlink communication costs. We introduce transformed robust aggregation to give convergence guarantees for general non-convex objectives under client data heterogeneity. Empirical evaluations for standard learning tasks and fine-tuning large language models show that CyBeR-0 exhibits stable performance with a per-round communication cost of only a few scalars and reduced memory requirements.
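
The zero-order idea behind the communication savings can be sketched with a standard two-point gradient estimator (a generic illustration, not CyBeR-0's exact scheme): clients only need to send scalar function differences, because the random direction can be regenerated from a shared seed.

```python
import numpy as np

def zo_grad(f, x, mu=1e-4, rng=None):
    """Two-point zero-order gradient estimate: only the scalar difference
    f(x + mu*u) - f(x) must be communicated, since the random direction u
    can be regenerated on the server from a shared seed."""
    rng = rng or np.random.default_rng()
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

f = lambda x: 0.5 * np.sum(x ** 2)  # toy objective; true gradient is x itself
x = np.array([1.0, -2.0, 3.0])
est = np.mean([zo_grad(f, x, rng=np.random.default_rng(s)) for s in range(500)],
              axis=0)
print(est)  # concentrates around the true gradient [1, -2, 3]
```

The estimator is unbiased up to an O(mu) smoothing term, and averaging many directions (across rounds or clients) drives down its variance, which is why such methods remain competitive despite communicating only scalars.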


Review for NeurIPS paper: Reliable Graph Neural Networks via Robust Aggregation

Neural Information Processing Systems

The practical time cost of the proposed aggregation function is not reported; the worst-case complexity is analyzed to some extent, but practitioners may care more about wall-clock time. What is the cost of one training epoch (or one inference pass) compared to vanilla models such as GCN, and compared to other defenses? Line 158 mentions that the Soft Medoid comes with the risk of a higher bias for small perturbations and a high epsilon; the authors should take this effect into account when conducting the experiments, to better expose this shortcoming to readers. As found in the paper, the increased robustness against structure-based attacks comes at the cost of decreased robustness to attribute attacks; the authors should make clear how much robustness to attribute attacks is lost by using Soft Medoid aggregation, otherwise the method merely makes the model robust to structure attacks while leaving it highly vulnerable to attribute attacks.


Review for NeurIPS paper: Reliable Graph Neural Networks via Robust Aggregation

Neural Information Processing Systems

This paper proposes a new robust aggregation function for graph neural networks to defend against adversarial attacks. The questions raised by the reviewers have been addressed properly in the rebuttal. However, one of the reviewers found that the theoretical analysis provided in this paper does not really prove the "adversarial robustness" of the proposed aggregation function. More specifically, the analysis only shows that it is harder for an attacker to drive the aggregated result to -\infty, whereas adversarial robustness requires showing that the aggregated result does not change significantly under small input perturbations. The AC and the other reviewers agree with this point but think the paper still has enough novelty and empirical contribution.


Secure Byzantine-Robust Distributed Learning via Clustering

Velicheti, Raj Kiriti, Xia, Derek, Koyejo, Oluwasanmi

arXiv.org Artificial Intelligence

Federated learning systems that jointly preserve Byzantine robustness and privacy have remained an open problem. Robust aggregation, the standard defense for Byzantine attacks, generally requires server access to individual updates or nonlinear computation -- thus is incompatible with privacy-preserving methods such as secure aggregation via multiparty computation. To this end, we propose SHARE (Secure Hierarchical Robust Aggregation), a distributed learning framework designed to cryptographically preserve client update privacy and robustness to Byzantine adversaries simultaneously. The key idea is to incorporate secure averaging among randomly clustered clients before filtering malicious updates through robust aggregation. Experiments show that SHARE has similar robustness guarantees as existing techniques while enhancing privacy.
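
The two-stage idea can be sketched in a few lines (illustrative data; in SHARE the in-cluster average is computed cryptographically so the server never sees individual updates, and the median below is just one possible robust rule):

```python
import numpy as np

rng = np.random.default_rng(1)
updates = np.vstack([rng.normal(0.0, 0.1, size=(9, 2)) + [1.0, -1.0],  # 9 honest clients
                     [[30.0, 30.0]]])                                  # 1 Byzantine client

# Stage 1: average within random clusters (secure averaging in SHARE, so only
# cluster means are revealed). Stage 2: robustly aggregate the cluster means.
clusters = rng.permutation(len(updates)).reshape(5, 2)  # 5 clusters of 2 clients
cluster_means = updates[clusters].mean(axis=1)
robust = np.median(cluster_means, axis=0)
print(robust)  # close to the honest mean [1, -1]
```

Random clustering caps how many cluster means a fixed number of Byzantine clients can corrupt, so the stage-2 robust aggregator still sees a majority of clean inputs while stage 1 preserves individual privacy.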


Byzantine-Robust Variance-Reduced Federated Learning over Distributed Non-i.i.d. Data

Peng, Jie, Wu, Zhaoxian, Ling, Qing

arXiv.org Machine Learning

We propose a Byzantine-robust variance-reduced stochastic gradient descent (SGD) method to solve the distributed finite-sum minimization problem when the data on the workers are not independent and identically distributed (i.i.d.). During the learning process, an unknown number of Byzantine workers may send malicious messages to the master node, leading to significant learning error. Most Byzantine-robust methods address this issue by using robust aggregation rules to aggregate the received messages, but rely on the assumption that all the regular workers have i.i.d. data, which is not the case in many federated learning applications. In light of the significance of reducing stochastic gradient noise for mitigating the effect of Byzantine attacks, we use a resampling strategy to reduce the impact of both inner variation (which describes the sample heterogeneity on every regular worker) and outer variation (which describes the sample heterogeneity among the regular workers), along with a stochastic average gradient algorithm (SAGA) to fully eliminate the inner variation. The variance-reduced messages are then aggregated with a robust geometric median operator. Under certain conditions, we prove that the proposed method reaches a neighborhood of the optimal solution at a linear convergence rate, and the learning error is much smaller than those given by the state-of-the-art methods in the non-i.i.d. setting. Numerical experiments corroborate the theoretical results and show satisfactory performance of the proposed method.
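
The geometric-median aggregation step can be sketched with the classic Weiszfeld algorithm (an illustration of the operator itself, with toy messages, not the paper's full variance-reduced method):

```python
import numpy as np

def geometric_median(X, iters=100, eps=1e-8):
    """Weiszfeld iterations: repeatedly re-weight the points by the inverse
    of their distance to the current estimate (eps guards against division
    by zero when the estimate lands exactly on a data point)."""
    y = X.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(X - y, axis=1), eps)
        w = 1.0 / d
        y = (w[:, None] * X).sum(axis=0) / w.sum()
    return y

msgs = np.vstack([np.ones((7, 2)),      # regular workers' variance-reduced messages
                  [[100.0, -100.0]]])   # one Byzantine message
print(geometric_median(msgs))  # stays near [1, 1] despite the outlier
```

Like the median in one dimension, the geometric median has a breakdown point of 0.5: the single malicious message out of eight barely moves the estimate, whereas the plain mean would land near [13.4, -11.6].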